Instructions Scheduling for Highly Super-scalar Processors

Authors

Bill Appelbe

Raja Das

Abstract

Super-scalar processors can execute multiple instructions out-of-order per cycle and speculatively execute instructions through branches. Such processors invalidate many of the assumptions of traditional instruction scheduling. This article analyzes the impact of super-scalar processor architecture upon instruction scheduling. The compile-time schedule is shown to significantly impact performance, despite out-of-order execution. The problem of determining an optimal schedule at compile-time is shown to be NP-complete. A variety of heuristics for instruction scheduling are applied to benchmarks, and it is shown that traditional depth-first instruction scheduling performs badly compared to a variety of breadth-first instruction scheduling heuristics.

Modern high-performance microprocessors, such as the HP PA-8000 [12] or the PowerPC 620 [4], are all super-scalar, meaning that they can fetch, decode, and execute several instructions at once. Current processors are often at least 4-way super-scalar. Together with the ability to execute multiple instructions in a single cycle, such highly super-scalar processors usually have several other characteristics that significantly impact compiler optimization and instruction scheduling, including:

Run-time Instruction Scheduling: After instructions have been fetched and decoded, they are placed in an instruction reorder buffer awaiting execution. Instructions are executed out-of-order, meaning that an instruction in the instruction buffer is scheduled for execution when all of its operands are available.

Run-time Register Renaming: Most instruction sets are limited to about 32 real registers. Reuse of registers thus creates anti-dependences: a write into a register can conflict with a preceding read. Register renaming eliminates such dependences by internally mapping instruction set registers to a larger set of virtual registers. Each time a register is written, it is dynamically assigned a new virtual register, thus converting the program dynamically into a single-assignment form.

Speculative Execution: The direction of conditional branches can be predicted at run-time, using a cache of previous branch directions. If the outcome of a conditional branch prediction is incorrect, all instructions subsequent to the branch are re-executed. Modern processors can speculatively predict through many branches and recover from an incorrect prediction in a single cycle.

… between register accesses when instructions are decoded. However, dependences between memory accesses may not be known until the effective addresses of the LOAD and STORE instructions are calculated at run-time. This would seem to imply that LOAD and STORE instructions must be executed in order. However, LOADs and STOREs can speculatively execute out of order, provided that such instructions are committed to memory in order, and run-time dependences are detected …
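The register renaming step described in the abstract can be illustrated with a small sketch. This is not the authors' code or any processor's actual mechanism; it is a minimal model, assuming a hypothetical instruction encoding of `(dest, sources)` tuples and hypothetical virtual register names `v0`, `v1`, and so on. It shows how giving every write a fresh virtual register removes the anti-dependence between a read of a register and a later write to it.

```python
from itertools import count


def rename_registers(instructions):
    """Map architectural registers to fresh virtual registers.

    Each instruction is a hypothetical (dest, sources) pair: dest is the
    register written (or None), sources is a tuple of registers read.
    """
    fresh = count()   # supplies unused virtual register numbers
    current = {}      # architectural register -> latest virtual name

    def lookup(reg):
        # A register read before any write gets an initial virtual name.
        if reg not in current:
            current[reg] = f"v{next(fresh)}"
        return current[reg]

    renamed = []
    for dest, sources in instructions:
        # Rename the reads first so they see the mapping before this write.
        new_sources = tuple(lookup(r) for r in sources)
        new_dest = None
        if dest is not None:
            # Every write is assigned a brand-new virtual register.
            current[dest] = f"v{next(fresh)}"
            new_dest = current[dest]
        renamed.append((new_dest, new_sources))
    return renamed


program = [
    ("r1", ("r2", "r3")),  # r1 = r2 op r3
    ("r4", ("r1",)),       # r4 = op r1   (reads r1)
    ("r1", ("r5",)),       # r1 = op r5   (anti-dependence on the read above)
]

for instruction in rename_registers(program):
    print(instruction)
```

In the renamed program the third instruction writes a different virtual register from the one the second instruction reads, so in this model the two no longer conflict and could be issued in either order. Each destination is written exactly once, which is the single-assignment property the abstract mentions.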

Similar resources

Evaluating Register Assignment Strategies on Out-of-order Issue Superscalar Processors

Register allocation impacts the performance of programs on fine-grain parallel architectures. The allocator can introduce dependences that did not exist among the original code sequence, preventing instructions from being executed in parallel. What is not well-understood is the impact of the allocation on code targeted for an out-of-order issue, super-scalar architecture. The dynamic scheduling ...

Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors

Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of resources in modern super-scalar processors. SMT processors increase instruction-level parallelism (ILP) and resource utilization by simultaneously executing instructions from multiple independent threads. Although simultaneously sharing resources benefits system throughput, coscheduled threads oft...

Using Conditional Execution to Exploit Instruction Level Concurrency

Multiple-instruction-issue processors seek to improve performance over scalar RISC processors by providing multiple pipelined functional units in order to fetch, decode and execute several instructions per cycle. The process of identifying instructions which can be executed in parallel and distributing them between the available functional units is referred to as instruction scheduling. This pa...

Optimization in Parallelizing Compilers An Introduction to the Minitrack

Parallel processing is at a critical point of its evolution. After a long period of intense support by government and academia, it slowly moves to derive the bulk of its support from the commercial world. Such a move brings with it a change of emphasis from record breaking performance to price/performance and sustained speed of program execution. The winning architectures are not only fast but ...

Initial Results on the Performance and Cost of Vector

Increasingly wider superscalar processors are experiencing diminishing performance returns while requiring larger portions of die area dedicated to control rather than datapath. As an alternative to using these processors to exploit parallelism effectively, we are investigating the viability of using single-chip vector microprocessors. This paper presents some initial results of our investigation w...

Publication date: 1997
